Defining Knative Applications as Code
Learn how to define Knative applications as code.
Viewing definitions#
Executing commands like kn service create is great because it’s simple. But it’s the wrong approach to deploying any type of application, Knative included. Maintaining a system created through ad hoc commands is a nightmare. The initial benefits from that approach are often overshadowed by the cost that comes later.
So we’ll move on with the assumption that you want to have a YAML file that defines your application. It could be some other format but, given that almost everything is YAML in the Kubernetes world, we’ll assume that’s what you need.
So, let’s take a look at how we’d define our application. We’ve already prepared a sample definition devops-toolkit.yaml to use.
That definition could be shorter. If we want to accomplish the same result as what we had with the kn service create command, we don’t need the annotations and the resources section. But we wanted to show that we can be more precise. That’s one of the big advantages of Knative. It can be as simple or as complicated as we need it to be. But we don’t have time to go into details of everything we might (or might not) want to do. Instead, we’re trying to gain just enough knowledge to decide whether Knative is worth exploring in more detail and potentially adopting it as a way to define, deploy, and manage some (if not all) of our applications.
The annotations tell Knative that we want to scale to zero replicas if there's no traffic and that there should never be more than three replicas. We could, for example, choose never to scale below two replicas and to allow far more than three. That would give us scalability and high availability without making our application serverless, that is, without ever scaling down to zero replicas.
The containerConcurrency field is set to 100, meaning that, in a simplified form, there should be one replica for every hundred concurrent requests while never going above the maxScale value.
The image, ports, and resources fields should be self-explanatory since those are the same ones we would typically use in, let’s say, a Deployment.
There are also some limitations we might need to be aware of. The most important one is that we can have only one container for each application managed by Knative. If we try to add additional entries to the containers array, we’d see that kubectl apply will throw an error.
That’s it. Let’s apply that definition and see what we’ll get.
Viewing the application#
We created a single resource. We didn’t specify a Deployment nor did we create a Service. We didn’t define a HorizontalPodAutoscaler. We didn’t create any of the things we usually do. Still, our application should have all those and quite a few others. It should be fully operational, it should be scalable, and it should be serverless. Knative created all those resources, and it made our application serverless through that single short YAML definition. That’s a very different approach from what we typically expect from Kubernetes.
Kubernetes is, in a way, a platform to build platforms. It allows us to create very specialized resources that provide value only when combined together. An application runs in Pods, Pods need ReplicaSets to scale, ReplicaSets need Deployments for applying new revisions. Communication is done through Services. External access is provided through Ingress. And so on and so forth.
Usually, we need to create and maintain all those and quite a few other resources ourselves. So, we end up with many YAML files, a lot of repetition, and a lot of definitions that aren't valuable to end users but are instead required for Kubernetes’ internal operations. Knative simplifies all that by requiring us to define only the differentiators and the things that matter to us. It provides a layer on top of Kubernetes that, among other things, aims to simplify the way we define our applications.
We’ll take a closer look at some (not all) of the resources Knative created for us. But, before we do that, let’s confirm that our application is indeed running and accessible.
We’ve seen a similar result before. The major difference is that, this time, we applied a YAML definition instead of relying on kn service create to do the work. As such, we can store that definition in a Git repository. We can apply whatever process we use to make changes to the code, and we can hook it into whatever CI/CD tool we are using.
Now, let’s see which resources were created for us. The right starting point is kservice since that’s the only one we created. Whatever else might be running in the production, namespace was created by Knative and not us.
The output is as follows.
As we already mentioned, that single resource created quite a few others. For example, we have revisions. But, to get to revisions, we should first talk about the Knative Configuration.
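We can list the Configurations with a command like the following:

```shell
kubectl --namespace production get configurations
```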
The output is as follows.
The Configuration resource contains and maintains the desired state of our application. Whenever we change Knative Service, we are effectively changing the Configuration, which creates a new revision.
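Listing the revisions follows the same pattern:

```shell
kubectl --namespace production get revisions
```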
The output is as follows.
Each time we deploy a new version of our application, a new immutable revision is created. A revision is a collection of almost all the application-specific resources: each one has its own Service, a Deployment, a Knative PodAutoscaler, and, potentially, a few other resources. Revisions allow Knative to decide which request goes where, how to roll back, and a few other things.
Confirming the deployment#
Now that we mentioned Deployments, Services, and other resources, let’s confirm that they were indeed created. Let’s start with Deployments.
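A sketch of the command:

```shell
kubectl --namespace production get deployments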
The output is as follows.
The Deployment is indeed there. The curious thing is that 0 out of 0 replicas are ready. Since it's been a while since we interacted with the application, Knative decided that there's no point in running it, so it scaled it down to zero replicas. As we already saw, it will scale back up when we start sending requests to the associated Service. Let's take a look at the Services as well.
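We can retrieve the Kubernetes Services and the Istio VirtualServices in one go; the `virtualservices` resource will only exist if Istio is installed in the cluster:

```shell
kubectl --namespace production \
    get services,virtualservices
```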
The output is as follows.
We can see that Knative created not only Kubernetes Services but also Istio VirtualServices. Since we told it that we want to combine it with Istio, it understood that we need not only Kubernetes core resources but also those specific to Istio. If we chose a different service mesh, it would create whatever makes sense for that mesh.
Further on, we get the PodAutoscaler.
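A sketch of the command:

```shell
kubectl --namespace production get podautoscalers
```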
The output is as follows.
The PodAutoscaler is in charge of scaling the Pods to comply with the changes in traffic, or whatever other criteria we might use. By default, it measures the incoming traffic, but it can be extended to use formulas based on queries from, for example, Prometheus.
Finally, we have a Route.
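A sketch of the command (note that, in this cluster, `routes` refers to the Knative resource):

```shell
kubectl --namespace production get routes
```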
The output is as follows.
Routes map endpoints (e.g., a subdomain) to one or more revisions of the application. They can be configured in quite a few different ways but, in essence, they're the entities that route traffic to our applications.
Load testing of application#
We’re almost finished. There’s only one crucial thing left to observe, at least from the perspective of a quick overview of Knative. What happens when our appliation must handle multiple requests at once? We saw that when we don’t interact with the app, itns scaled down to 0 replicas. We also saw that when we send a request to it, it scales up to 1 replica. But what happens if we start sending five hundred concurrent requests? Take another look at devops-toolkit.yaml and make a guess. It shouldn’t be hard.
We’ll use Siege to send requests to our application. To be more specific, we’ll use it to send a stream of five hundred concurrent requests over sixty seconds. We’ll also retrieve all the Pods from the production namespace right after Siege is finished.
Like before, the commands will differ slightly depending on the Kubernetes platform you’re using.
Note: You won’t be able to use Siege with Docker Desktop. That shouldn’t be a big deal since the essential thing is the output, which you can see here.
Note: Please use the command that follows if you’re using minikube or EKS.
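A sketch of what that command might look like, running Siege as a throwaway Pod inside the cluster; the container image and the application's address are assumptions, so adjust them to your setup:

```shell
# Stream 500 concurrent requests for 60 seconds, then list the Pods
# to see how far Knative scaled the application up
kubectl run siege \
    --image yokogawa/siege \
    -it --rm \
    -- --concurrent 500 --time 60S \
    "http://devops-toolkit.production.example.com" \
    && kubectl --namespace production get pods
```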
The output, in our case, is as follows.
We can see that over forty thousand requests were sent, and the availability is 100.00%. That might not always be the situation, so don’t be alarmed if, in your case, it’s a slightly lower figure. Your cluster might not have enough capacity to handle the increase in workload and might need to scale up. In such a case, the time required to scale up the cluster might have been too long for all the requests to be processed. You can always wait for a while for all the Pods to terminate and try again with increased cluster capacity.
Note: For now, Knative does not give 100% availability. If you have huge variations in traffic, you can expect something closer to 99.9% availability. But that’s only when there’s a huge difference like the one we just had. Our traffic jumped from zero to a continuous stream of five hundred concurrent requests within milliseconds. For the “normal” usage, it should be closer to 100% (e.g., 99.99%) availability.
What truly matters is that the number of Pods was increased from zero to three. Typically, there should be five Pods since we set the containerConcurrency value to 100, and we were streaming 500 concurrent requests. But we also set the maxScale annotation to 3, so it reached the limit of the allowed number of replicas.
In the time it takes to read this, Knative has probably already started scaling down the application. It probably scaled it to one replica, to keep it warm in case new requests come in. After a while, it should scale down to nothing (zero replicas) as long as traffic remains absent.
The vital thing to note is that Knative does not react to current metrics alone. It will not scale up the moment the first request arrives that the existing replicas cannot handle, nor will it scale down to zero replicas the moment requests stop coming in. It changes things gradually, using both current and historical metrics to figure out what to do next.
The number of Pods should have dropped to zero by now. Let’s confirm that.
The output states that "no resources" were found in the production namespace. In your case, a Pod might still be running, or its status might be Terminating. If that's the case, wait a while longer and repeat the previous command.
There are many aspects of Knative that we didn’t explore. This chapter’s goal was to introduce Knative so you can see whether it’s a tool worth investing in. We tried to provide as much essential information as we could while still being quick and concise. For more information about Knative, please visit its documentation.
Note: We'll get hands-on experience with the concepts and commands discussed in this lesson in the project "Hands on: Using Self-Managed Containers as a Service on AWS" right after this chapter.
Load testing for AKS or GKE#
Run the following command for load testing if you are using GKE or AKS.
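A sketch of the GKE/AKS variant; the only meaningful difference from the previous command is that we retrieve the application's external address from the Knative Service instead of hardcoding it (the Siege image is still an assumption):

```shell
kubectl run siege \
    --image yokogawa/siege \
    -it --rm \
    -- --concurrent 500 --time 60S \
    "$(kubectl --namespace production \
        get kservice devops-toolkit \
        --output jsonpath='{.status.url}')" \
    && kubectl --namespace production get pods
```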